Golang Job: Backend Developer - Go (Compute Team)

Company

Nebius

Location

Amsterdam - Netherlands

Job type

Full-Time

Golang Job Details

Who are we?


Nebius is a modern technology venture that enables country-scale B2B companies to build their own local hyperscale cloud platforms, connecting businesses, tech communities, and public organizations. We provide not only the technologies but also a launch-ready business model that includes customizable tools for support, sales, and marketing.

Innovation is in our DNA. We design and build our own server racks for data centers so that they can handle any load, and we invest heavily in data management, machine translation, and speech recognition so that our partners can enhance their own IT infrastructure and provide cutting-edge cloud solutions for their markets. In compliance with ISO standards and the GDPR, we carry out in-depth security checks to ensure the highest level of data sovereignty.

Our team

Compute is the service responsible for creating, monitoring and managing all virtual machines in our clouds. These virtual machines are the basic computational 'bricks' our clients (both internal and external) use to build their solutions. We aim to provide all the key benefits of virtualization, such as isolation, resilience and efficient hardware utilization.

Compute is one of the most fundamental services of our Cloud product. Almost all our internal services and all client-facing services are built on top of Compute, so stability and reliability are our top priorities. We carefully design the service so that it keeps operating even when some key cloud or hardware components are down.

The service consists of three layers: Compute API, Compute Node, and Scheduler.

  • Compute API is the entry point for all virtual machine management requests. These requests can be internal or external, and generated manually or automatically. We use a microservice architecture to execute these VM management tasks in a distributed and resilient way.
  • Compute Node works with virtual machines directly on the hardware. It creates and configures new virtual machines and attaches all the necessary resources, such as CPUs, RAM, GPUs, and virtual disks. Compute Node is also responsible for communicating with running virtual machines, monitoring them, and live-migrating them to another physical host.
  • Scheduler is the smart brain of our service. It oversees the whole cluster and monitors its state and health. The key goals of the Scheduler are to keep track of all resources, satisfy as many resource allocation requests as possible, and use the hardware as efficiently as the business requires.
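The Scheduler's goals are stated above, but its algorithm is not. Purely to illustrate the allocation problem, here is a Go sketch of a textbook best-fit-decreasing bin-packing heuristic. It is not Nebius's scheduler: the `Host`, `Request`, and `Allocate` names and the CPU/RAM-only resource model are hypothetical assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Host is a hypothetical physical server with its remaining free capacity.
type Host struct {
	Name    string
	FreeCPU int // vCPUs
	FreeRAM int // GiB
}

// Request is a hypothetical VM allocation request.
type Request struct {
	VM  string
	CPU int
	RAM int
}

// Allocate places requests with a best-fit-decreasing heuristic: the largest
// requests go first, each onto the fitting host with the least free CPU.
// It updates hosts in place and returns VM -> host; unplaced VMs are omitted.
func Allocate(hosts []Host, reqs []Request) map[string]string {
	sorted := append([]Request(nil), reqs...)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].CPU != sorted[j].CPU {
			return sorted[i].CPU > sorted[j].CPU
		}
		return sorted[i].RAM > sorted[j].RAM
	})
	placement := make(map[string]string)
	for _, r := range sorted {
		best := -1
		for i, h := range hosts {
			if h.FreeCPU < r.CPU || h.FreeRAM < r.RAM {
				continue // this host cannot fit the request
			}
			if best == -1 || h.FreeCPU < hosts[best].FreeCPU {
				best = i // tightest fit so far
			}
		}
		if best == -1 {
			continue // no host has room; the request stays pending
		}
		hosts[best].FreeCPU -= r.CPU
		hosts[best].FreeRAM -= r.RAM
		placement[r.VM] = hosts[best].Name
	}
	return placement
}

func main() {
	hosts := []Host{{"host-a", 16, 64}, {"host-b", 8, 32}}
	reqs := []Request{{"vm-1", 8, 16}, {"vm-2", 4, 8}, {"vm-3", 8, 32}, {"vm-4", 8, 32}}
	fmt.Println(Allocate(hosts, reqs)) // map[vm-1:host-a vm-3:host-b vm-4:host-a]
}
```

Best fit tends to keep larger blocks of capacity free for future large VMs, which serves the efficiency goal. A real scheduler would also have to account for GPUs, disks, failure domains, and live migration, none of which is modeled here.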

Key technologies we use

  • Programming language: Go
  • YDB, gRPC
  • QEMU Machine Protocol and Linux system calls for managing VMs
  • Deployment with Salt and Kubernetes

What you will do

Though Compute is one of the oldest services in our Cloud, almost all of its components are under active development. Demand for cloud capabilities and capacity keeps growing, constantly confronting us with new scalability and efficiency challenges. Our tasks fall into the following groups:

  • Day-to-day operations that keep the service running properly. This includes finding and fixing bugs, analyzing dashboards and logs, and adding new resources. Right now the operational burden is high and can take up to 50% of our team's engineering capacity. One of our top priorities is developing tooling that eases and automates operations in order to reduce this share dramatically.
  • Improving the VM management logic. This includes working with the database (YDB), gRPC, and REST, and designing and implementing management tasks so that they can keep running after disaster recovery.
  • Working with VMs directly on physical machines. This includes using Linux system calls and process management, and communicating with running VMs via the QEMU Machine Protocol.
  • Deployment tasks. This includes bootstrap and upgrade scenarios and working with Salt and Kubernetes.
  • Developing auxiliary tools for monitoring, diagnosis, and repair.
  • Algorithmic optimization of cluster management and resource allocation, and a cluster simulator for testing and experiments.

Qualifications:

  • 3+ years of professional software development experience;
  • Excellent knowledge of Golang;
  • Ability to write reliable code and dig into complex problems;
  • Teamwork-oriented approach.

Preferred qualifications:

  • Experience designing high-load, scalable services.